Abstract: Existing methods for sparse channel estimation
typically provide an estimate computed as the solution maximizing
an objective function defined as the sum of the log-likelihood
function and a penalization term proportional to the $\ell_1$-norm of
the parameter of interest. However, other penalization terms have 
proven to have strong sparsity-inducing properties. In this work, 
we design pilot-assisted channel estimators for OFDM wireless 
receivers within the framework of sparse Bayesian learning 
by defining hierarchical Bayesian prior models that lead to 
sparsity-inducing penalization terms. The estimators result as 
an application of the variational message-passing algorithm on 
the factor graph representing the signal model extended with 
the hierarchical prior models. Numerical results demonstrate the 
superior performance of our channel estimators as compared to 
traditional and state-of-the-art sparse methods. 

I. Introduction 

During the last few years the research on compressive
sensing techniques and sparse signal representations [1], [2]
applied to channel estimation has received considerable
attention, see e.g., [2]-[7]. The reason is that, typically, the impulse
response of the wireless channel has a few dominant multipath
components. A channel exhibiting this property is said to be
sparse [3].

The general goal of sparse signal representations from
overcomplete dictionaries is to estimate the sparse vector $\alpha$
in the following system model:

$$ y = \Phi\alpha + w. \tag{1} $$

In this expression $y \in \mathbb{C}^M$ is the vector of measurement
samples and $w \in \mathbb{C}^M$ represents the samples of the additive
white Gaussian random noise with covariance matrix $\lambda^{-1} I$ and
precision parameter $\lambda > 0$. The matrix
$\Phi = [\phi_1, \ldots, \phi_L] \in \mathbb{C}^{M \times L}$ is the
overcomplete dictionary with more columns than rows ($L > M$) and
$\alpha = [\alpha_1, \ldots, \alpha_L]^T \in \mathbb{C}^L$ is an unknown
sparse vector, i.e., $\alpha$ has few nonzero elements at unknown
locations.

Often, a sparse channel estimator is constructed by solving
the $\ell_1$-norm constrained quadratic optimization problem, see
among others PJ-JS):

$$ \hat{\alpha} = \arg\min_{\alpha} \{ \|y - \Phi\alpha\|_2^2 + \kappa \|\alpha\|_1 \} \tag{2} $$

with $\kappa > 0$ and $\|\cdot\|_p$, $p \geq 1$, denoting the $\ell_p$
vector norm. This method is also known as Least Absolute Shrinkage and
Selection Operator (LASSO) regression [8] or Basis Pursuit
Denoising [9]. The popularity of the LASSO regression is
mainly attributed to the convexity of the cost function, as well
as to its provable sparsity-inducing properties (see [2]). In
[6] the LASSO regression is applied to orthogonal frequency-division
multiplexing (OFDM) pilot-assisted channel estimation. Various channel
estimation algorithms that minimize the LASSO cost function using
convex optimization are compared in [6].
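As an illustration of how a cost function of the form (2) can be minimized, the following is a minimal sketch of a proximal gradient (ISTA) iteration for complex-valued data. It is not one of the algorithms compared in [6]; the function names, the step-size rule, and the iteration count are assumptions made for this example.

```python
import numpy as np

def soft_threshold(z, thr):
    """Complex soft-thresholding: shrink magnitudes by thr, keep phases."""
    mag = np.abs(z)
    scale = np.maximum(mag - thr, 0.0) / np.maximum(mag, 1e-300)
    return scale * z

def lasso_cost(Phi, y, alpha, kappa):
    """Cost function of (2): squared residual plus kappa times the l1 norm."""
    return np.linalg.norm(y - Phi @ alpha) ** 2 + kappa * np.sum(np.abs(alpha))

def lasso_ista(Phi, y, kappa, n_iter=500):
    """Proximal gradient iterations for (2), started from the zero vector.

    The step 1 / (2 * ||Phi||_2^2) is the inverse Lipschitz constant of the
    gradient of the quadratic term (an assumption of this sketch).
    """
    step = 1.0 / (2.0 * np.linalg.norm(Phi, 2) ** 2)
    alpha = np.zeros(Phi.shape[1], dtype=complex)
    for _ in range(n_iter):
        grad = 2.0 * Phi.conj().T @ (Phi @ alpha - y)
        alpha = soft_threshold(alpha - step * grad, step * kappa)
    return alpha
```

For an orthonormal dictionary the iteration reaches its fixed point after one step, namely the soft-thresholded projection of $y$ with threshold $\kappa/2$.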

Another approach to sparse channel estimation is sparse
Bayesian learning (SBL) Q, [10]-[12]. Specifically, SBL aims
at finding a sparse maximum a posteriori (MAP) estimate of $\alpha$,

$$ \hat{\alpha} = \arg\min_{\alpha} \{ \|y - \Phi\alpha\|_2^2 + \lambda^{-1} Q(\alpha) \}, \tag{3} $$

by specifying a prior $p(\alpha)$ such that the penalty term
$Q(\alpha) \propto^e -\log p(\alpha)$ induces a sparse estimate
$\hat{\alpha}$.¹

Obviously, by comparing (2) and (3) the SBL framework
realizes the LASSO cost function by choosing the Laplace
prior $p(\alpha) \propto \exp(-a\|\alpha\|_1)$ with
$\kappa = \lambda^{-1} a$. However, instead of working directly with
the prior $p(\alpha)$, SBL models this using a two-layer (2-L)
hierarchical structure. This involves specifying a conditional prior
$p(\alpha|\gamma)$ and a hyperprior $p(\gamma)$ such that
$p(\alpha) = \int p(\alpha|\gamma) p(\gamma) \, d\gamma$ has a
sparsity-inducing nature. The hierarchical approach to the
representation of $p(\alpha)$ has several important advantages. First
of all, one is free to choose simple and analytically tractable
probability density functions (pdfs). Second, when carefully chosen,
the resulting hierarchical structure allows for the construction of
efficient yet computationally tractable iterative inference algorithms
with analytical derivation of the inference expressions.

In [13] we propose a 2-L and a three-layer (3-L) prior
model for $\alpha$. These hierarchical prior models lead to novel
sparsity-inducing priors that include the Laplace prior for
complex variables as a special case. This paper adapts the
Bayesian probabilistic framework introduced in [13] to OFDM
pilot-assisted sparse channel estimation. We then propose a
variational message passing (VMP) algorithm that effectively
exploits the hierarchical structure of the prior models. This
approach leads to novel channel estimators that make use
of various priors with strong sparsity-inducing properties.
The numerical results reveal the promising potential of our
estimators with improved performance as compared to
state-of-the-art methods. In particular, the estimators outperform
LASSO.

¹ Here $x \propto^e y$ denotes $\exp(x) = \exp(\upsilon)\exp(y)$, and
thus $x = \upsilon + y$, for some arbitrary constant $\upsilon$. We
will also make use of $x \propto y$, which denotes $x = \upsilon y$ for
some positive constant $\upsilon$.

Throughout the paper we shall make use of the following
notation: $(\cdot)^T$ and $(\cdot)^H$ denote respectively the transpose
and the Hermitian transpose; the expression
$\langle f(x) \rangle_{q(x)}$ denotes the expectation of the function
$f(x)$ with respect to the density $q(x)$; $\mathrm{CN}(x|a, B)$
denotes a multivariate complex Gaussian pdf with mean $a$ and
covariance matrix $B$; similarly,
$\mathrm{Ga}(x|a, b) = \frac{b^a}{\Gamma(a)} x^{a-1} \exp(-bx)$ denotes
a Gamma pdf with shape parameter $a$ and rate parameter $b$.

II. Signal Model 

We consider a single-input single-output OFDM system
with $N$ subcarriers. A cyclic prefix (CP) is added to preserve
orthogonality between subcarriers and to eliminate inter-symbol
interference between consecutive OFDM symbols.
The channel is assumed static during the transmission of
each OFDM symbol. The received (baseband) OFDM signal
$r \in \mathbb{C}^N$ reads in matrix-vector notation

$$ r = Xh + n. \tag{4} $$

The diagonal matrix $X = \mathrm{diag}(x_1, x_2, \ldots, x_N)$ contains
the transmitted symbols. The components of the vector
$h \in \mathbb{C}^N$ are the samples of the channel frequency response
at the $N$ subcarriers. Finally, $n \in \mathbb{C}^N$ is a zero-mean
complex symmetric Gaussian random vector of independent components with
variance $\lambda^{-1}$.

To estimate the vector $h$ in (4), a total of $M$ pilot symbols
are transmitted at selected subcarriers. The pilot pattern
$\mathcal{P} \subseteq \{1, \ldots, N\}$ denotes the set of indices of
the pilot subcarriers. The received signals observed at the pilot
positions, $r_{\mathcal{P}}$, are then divided each by the
corresponding pilot symbol
$X_{\mathcal{P}} = \mathrm{diag}(x_n : n \in \mathcal{P})$ to produce
the vector of observations:

$$ y \triangleq (X_{\mathcal{P}})^{-1} r_{\mathcal{P}} = h_{\mathcal{P}} + (X_{\mathcal{P}})^{-1} n_{\mathcal{P}}. \tag{5} $$

We assume that all pilot symbols hold unit power such that
the statistics of the noise term
$(X_{\mathcal{P}})^{-1} n_{\mathcal{P}}$ remain unchanged, i.e.,
$y \in \mathbb{C}^M$ yields the samples of the true channel frequency
response (at the pilot subcarriers) corrupted by additive complex
white Gaussian noise with component variance $\lambda^{-1}$.

In this work, we consider a frequency-selective wireless
channel that remains constant during the transmission of
each OFDM symbol. The maximum relative delay $\tau_{\max}$ is
assumed to be large compared to the sampling time $T_s$, i.e.,
$\tau_{\max}/T_s \gg 1$ [3]. The impulse response of the wireless
channel is modeled as a sum of multipath components:

$$ g(\tau) = \sum_{k=1}^{K} \beta_k \, \delta(\tau - \tau_k). \tag{6} $$

In this expression, $\beta_k$ and $\tau_k$ are respectively the complex
weight and the continuous delay of the $k$th multipath component, and
$\delta(\cdot)$ is the Dirac delta function. The parameter
$K$ is the total number of multipath components. The channel
parameters $K$, $\beta_k$, and $\tau_k$, $k = 1, \ldots, K$, are random
variables. Specifically, the weights $\beta_k$, $k = 1, \ldots, K$, are
mutually uncorrelated and zero-mean, with the sum of their variances
normalized to one. Additional details regarding the assumptions
on the model (6) are provided in Section VI.
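The signal model in (4)-(6) can be sketched numerically as follows. This is only an illustration: the baseband subcarrier frequencies `df * n`, the number of paths, the noise precision, and all function and variable names are assumptions of the example; the subcarrier count, pilot count, spacing, and sampling time are taken from Table I.

```python
import numpy as np

def channel_frequency_response(beta, tau, freqs):
    """Samples of the Fourier transform of (6): sum_k beta_k exp(-j 2 pi f tau_k)."""
    return np.exp(-2j * np.pi * np.outer(freqs, tau)) @ beta

def pilot_observations(r, x, pilot_idx):
    """Observation vector of (5): received pilots divided by the pilot symbols."""
    return r[pilot_idx] / x[pilot_idx]

rng = np.random.default_rng(0)
N, M = 1200, 100                      # subcarriers and pilots (Table I)
df, Ts = 15e3, 32.55e-9               # subcarrier spacing and sampling time
freqs = df * np.arange(N)             # assumed baseband frequency grid
pilot_idx = np.arange(0, N, N // M)   # equally spaced pilot pattern

K = 5                                 # illustrative number of paths
tau = rng.uniform(0.0, 144 * Ts, K)
beta = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2 * K)
h = channel_frequency_response(beta, tau, freqs)

# Unit-power QPSK symbols on all subcarriers, noise with precision lam.
x = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, N)))
lam = 100.0
n = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2 * lam)

r = x * h + n                         # (4) with X = diag(x)
y = pilot_observations(r, x, pilot_idx)
```

Because the pilot symbols have unit modulus, dividing by them only rotates the noise samples, which is why the noise statistics in (5) are unchanged.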

III. The Dictionary Matrix 

Our goal is to estimate $h$ in (4) by applying the general
optimization problem (3) to the observation model (5). For
doing so, we must define a proper dictionary matrix $\Phi$. In this
section we give an example of such a matrix. As a starting
point, we invoke the parametric model (6) of the channel.
Making use of this model, (5) can be written as

$$ y = T(\tau)\beta + w \tag{7} $$

with $h_{\mathcal{P}} = T(\tau)\beta$,
$w = (X_{\mathcal{P}})^{-1} n_{\mathcal{P}}$,
$\beta = [\beta_1, \ldots, \beta_K]^T$,
$\tau = [\tau_1, \ldots, \tau_K]^T$, and
$T(\tau) \in \mathbb{C}^{M \times K}$ depending on the pilot
pattern $\mathcal{P}$ as well as the unknown delays in $\tau$.
Specifically, the $(m, k)$th entry of $T(\tau)$ reads

$$ T(\tau)_{m,k} = \exp(-j 2\pi f_m \tau_k), \quad m = 1, 2, \ldots, M, \quad k = 1, 2, \ldots, K, \tag{8} $$



with $f_m$ denoting the frequency of the $m$th pilot subcarrier.
In the general optimization problem (3) the columns of $\Phi$
are known. However, the columns of $T(\tau)$ in (7) depend on
the unknown delays in $\tau$. To circumvent this discrepancy we
follow the same approach as in [5] and consider a grid of
uniformly-spaced delay samples in the interval $[0, \tau_{\max}]$:

$$ \tau_d = \left[ 0, \frac{T_s}{\zeta}, \frac{2T_s}{\zeta}, \ldots, \tau_{\max} \right]^T \tag{9} $$

with $\zeta > 0$ such that $\zeta\tau_{\max}/T_s$ is an integer. We now
define the dictionary $\Phi \in \mathbb{C}^{M \times L}$ as
$\Phi = T(\tau_d)$. Thus, the entries of $\Phi$ are of the form (8)
with delay vector $\tau_d$. The number of columns
$L = \zeta\tau_{\max}/T_s + 1$ in $\Phi$ is thereby inversely
proportional to the selected delay resolution $T_s/\zeta$.

It is important to notice that the system model (1) with $\Phi$
defined using discretized delay components is an approximation
of the true system model (7). This approximation model
is introduced so that (3) can be applied to solve the channel
estimation task. The estimate of the channel vector at the pilot
subcarriers is then $\hat{h}_{\mathcal{P}} = \Phi\hat{\alpha}$. In
order to estimate the channel in (4) the dictionary $\Phi$ is
appropriately expanded (row-wise) to include all $N$ subcarrier
frequencies.
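The construction of the delay grid (9) and of the dictionary with entries of the form (8) can be sketched as below. The function names and the baseband frequency grid are assumptions of this example; the numerical values reproduce the setting $L = 200$, $\tau_{\max} = 144\,T_s$ used later in the paper.

```python
import numpy as np

def delay_grid(tau_max, Ts, zeta):
    """Uniform delay grid of (9) with resolution Ts / zeta on [0, tau_max]."""
    n_cols = int(round(zeta * tau_max / Ts)) + 1
    return np.linspace(0.0, tau_max, n_cols)

def dictionary(freqs, delays):
    """Matrix with entries exp(-j 2 pi f_m tau_l), as in (8)."""
    return np.exp(-2j * np.pi * np.outer(freqs, delays))

Ts = 32.55e-9
tau_d = delay_grid(144 * Ts, Ts, zeta=199 / 144)    # 200 grid points
pilot_freqs = 15e3 * np.arange(0, 1200, 12)         # assumed pilot frequencies
all_freqs = 15e3 * np.arange(1200)                  # all N subcarriers

Phi = dictionary(pilot_freqs, tau_d)                # M x L, used for estimation
Phi_full = dictionary(all_freqs, tau_d)             # N x L, row-wise expansion
```

Given an estimate `alpha_hat` of the sparse vector, the channel at all subcarriers would then be obtained as `Phi_full @ alpha_hat`.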

IV. Bayesian Prior Modeling 

In this section we specify the joint pdf of the system model
(1) when it is augmented with the 2-L and the 3-L hierarchical
prior model.

Fig. 1. 2-L hierarchical prior pdf for $\alpha \in \mathbb{C}^2$:
(a) Contour plot of the restriction to the
$\mathrm{Im}\{\alpha_1\} = \mathrm{Im}\{\alpha_2\} = 0$ plane of the
penalty term
$Q(\alpha_1, \alpha_2; \epsilon, \eta) \propto^e -\log(p(\alpha_1; \epsilon, \eta) p(\alpha_2; \epsilon, \eta))$.
(b) Restriction to $\mathrm{Im}\{\phi_l^H y\} = 0$ of the resulting MAP
estimation rule (3) with $\epsilon$ as a parameter in the case when
$\Phi$ is orthonormal. The black dashed line indicates the
hard-threshold rule and the black solid line the soft-threshold rule
(obtained with $\epsilon = 3/2$). The black dashed line indicates the
penalty term resulting when the prior pdf is a circular symmetric
Gaussian pdf.

Fig. 2. Three-layer hierarchical prior pdf for
$\alpha \in \mathbb{C}^2$ with the setting $a = 1$, $b = 0.1$:
(a) Restriction to $\mathrm{Im}\{\phi_l^H y\} = 0$ of the resulting
MAP estimation rule (3) with $\epsilon$ as a parameter in the case
when $\Phi$ is orthonormal. The black dashed line indicates the
hard-threshold rule and the black solid line the soft-threshold rule.
(b) Contour plot of the restriction to the
$\mathrm{Im}\{\alpha_1\} = \mathrm{Im}\{\alpha_2\} = 0$ plane of the
penalty term
$Q(\alpha_1, \alpha_2; \epsilon, a, b) \propto^e -\log(p(\alpha_1; \epsilon, a, b) p(\alpha_2; \epsilon, a, b))$.

The joint pdf of (1) augmented with the 2-L
hierarchical prior model reads

$$ p(y, \alpha, \gamma, \lambda) = p(y|\alpha, \lambda) \, p(\lambda) \, p(\alpha|\gamma) \, p(\gamma; \eta). \tag{10} $$

The 3-L prior model considers the parameter $\eta$ specifying
the prior of $\gamma$ in (10) as random. Thus, the joint pdf of (1)
augmented with this hierarchical prior model is of the form

$$ p(y, \alpha, \gamma, \eta, \lambda) = p(y|\alpha, \lambda) \, p(\lambda) \, p(\alpha|\gamma) \, p(\gamma|\eta) \, p(\eta). \tag{11} $$

In (10) and (11) we have
$p(y|\alpha, \lambda) = \mathrm{CN}(y|\Phi\alpha, \lambda^{-1} I)$
due to (1). Furthermore, we select the conjugate prior
$p(\lambda) = p(\lambda; c, d) = \mathrm{Ga}(\lambda|c, d)$. Finally,
we let $p(\alpha|\gamma) = \prod_{l=1}^{L} p(\alpha_l|\gamma_l)$ with
$p(\alpha_l|\gamma_l) = \mathrm{CN}(\alpha_l|0, \gamma_l)$. In the
following we show the main results and properties of these prior
models. We refer to [13] for a more detailed analysis.

A. Two-Layer Hierarchical Prior Model 

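The 2-L prior model assumes that
$p(\gamma) = \prod_{l=1}^{L} p(\gamma_l)$ with
$p(\gamma_l) = p(\gamma_l; \epsilon, \eta_l) = \mathrm{Ga}(\gamma_l|\epsilon, \eta_l)$.
We compute the prior of $\alpha$ to be

$$ p(\alpha; \epsilon, \eta) = \int_0^\infty p(\alpha|\gamma) \, p(\gamma; \epsilon, \eta) \, d\gamma = \prod_{l=1}^{L} p(\alpha_l; \epsilon, \eta_l) \tag{12} $$

with

$$ p(\alpha_l; \epsilon, \eta_l) = \frac{2}{\pi\Gamma(\epsilon)} \, \eta_l^{\frac{\epsilon+1}{2}} \, |\alpha_l|^{\epsilon-1} K_{\epsilon-1}(2\sqrt{\eta_l}\,|\alpha_l|). \tag{13} $$

In this expression, $K_\nu(\cdot)$ is the modified Bessel function of
the second kind with order $\nu \in \mathbb{R}$. The prior (12) leads
to the general optimization problem (3) with penalty term

$$ Q(\alpha; \epsilon, \eta) = -\sum_{l=1}^{L} \log\left( |\alpha_l|^{\epsilon-1} K_{\epsilon-1}(2\sqrt{\eta_l}\,|\alpha_l|) \right). \tag{14} $$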

We now show that the 2-L prior model induces the $\ell_1$-norm
penalty term and thereby the LASSO cost function as
a special case. Selecting $\epsilon = 3/2$ and using the identity
$K_{1/2}(z) = \sqrt{\frac{\pi}{2z}} \exp(-z)$ [14], (13) yields the
Laplace prior

$$ p(\alpha_l; \epsilon = 3/2, \eta_l) = \frac{2\eta_l}{\pi} \exp(-2\sqrt{\eta_l}\,|\alpha_l|). \tag{15} $$

With the selection $\eta_l = \eta$, $l = 1, \ldots, L$, we obtain
$Q(\alpha; \eta) = 2\sqrt{\eta}\,\|\alpha\|_1$.
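The Laplace special case can be checked numerically without evaluating Bessel functions: the sketch below integrates the product of the complex Gaussian conditional prior and the Gamma hyperprior over $\gamma_l$, as in (12), and compares the result with (15) for $\epsilon = 3/2$. The function names, the integration grid, and the test values are assumptions of this example.

```python
import math
import numpy as np

def marginal_prior_2l(abs_alpha, eps, eta, n_grid=200_001):
    """Numerically integrate CN(alpha|0, gamma) * Ga(gamma|eps, eta) over gamma."""
    g = np.logspace(-12, 4, n_grid)                       # log-spaced gamma grid
    cn = np.exp(-abs_alpha ** 2 / g) / (np.pi * g)        # complex Gaussian pdf
    ga = eta ** eps / math.gamma(eps) * g ** (eps - 1) * np.exp(-eta * g)
    f = cn * ga
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(g)))  # trapezoid rule

def laplace_prior(abs_alpha, eta):
    """Closed form (15), valid for eps = 3/2."""
    return 2.0 * eta / np.pi * np.exp(-2.0 * np.sqrt(eta) * abs_alpha)

numeric = marginal_prior_2l(0.5, 1.5, 2.0)
closed_form = laplace_prior(0.5, 2.0)
```

The two values agree up to the quadrature error, which supports the reading of (13) and (15) given above.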

The prior pdf (13) is specified by $\epsilon$ and the regularization
parameter $\eta$. In order to get insight into the impact of $\epsilon$
on the properties of this prior pdf we consider the case
$\alpha \in \mathbb{C}^2$. In Fig. 1(a) the contour lines of the
restriction to the
$\mathrm{Im}\{\alpha_1\} = \mathrm{Im}\{\alpha_2\} = 0$ plane of
$Q(\alpha_1, \alpha_2; \epsilon, \eta) \propto^e -\log(p(\alpha_1; \epsilon, \eta) p(\alpha_2; \epsilon, \eta))$
are visualized;² each contour line is computed for a specific choice
of $\epsilon$. Notice that as $\epsilon$ decreases towards 0 more
probability mass accumulates along the $\alpha$-axes; as a
consequence, the mode of the resulting posterior is more likely to be
located close to the axes, thus promoting a sparse solution. The
behavior of the classical $\ell_1$ penalty term obtained for
$\epsilon = 3/2$ can also be clearly recognized. In Fig. 1(b) we
consider the case when $\Phi$ is orthonormal and compute the MAP
estimate $\hat{\alpha}$ in (3) with penalty term (14) for different
values of $\epsilon$. Note the typical soft-threshold-like behavior of
the estimators. As $\epsilon \to 0$, more components of $\hat{\alpha}$
are pulled towards zero since the threshold value increases, thus
encouraging a sparser solution.
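For $\epsilon = 3/2$ and orthonormal $\Phi$ the estimation rule can be written explicitly. With the penalty $2\sqrt{\eta}\,|\alpha_l|$ obtained from (15), problem (3) decouples into scalar problems in $z_l = \phi_l^H y$, and minimizing $|z_l - \alpha_l|^2 + \lambda^{-1} 2\sqrt{\eta}\,|\alpha_l|$ gives a soft-threshold with threshold $\sqrt{\eta}/\lambda$. This closed form is derived here for illustration and is not quoted from the paper; the function name is an assumption.

```python
import numpy as np

def map_rule_laplace(z, eta, lam):
    """Soft-threshold MAP rule for eps = 3/2 and orthonormal Phi.

    z is the projection phi_l^H y; the threshold sqrt(eta) / lam follows
    from minimizing |z - a|^2 + (2 sqrt(eta) / lam) |a| over a.
    """
    z = np.asarray(z, dtype=complex)
    thr = np.sqrt(eta) / lam
    mag = np.abs(z)
    scale = np.maximum(mag - thr, 0.0) / np.maximum(mag, np.finfo(float).tiny)
    return scale * z

# Inputs below the threshold map to zero, larger ones shrink by the threshold.
estimates = map_rule_laplace(np.array([0.1, 0.5, 1.0]), eta=0.25, lam=2.0)
```

With $\eta = 0.25$ and $\lambda = 2$ the threshold is $0.25$, so the first input is set to zero and the other two are shrunk by $0.25$.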



B. Three-Layer Hierarchical Prior Model 

We now turn to the SBL problem with a 3-L prior model for
$\alpha$ leading to the joint pdf in (11). Specifically, the goal is to
incorporate the regularization parameter $\eta$ into the inference
framework. To that end, we define
$p(\eta) = \prod_{l=1}^{L} p(\eta_l)$ with
$p(\eta_l) = p(\eta_l; a_l, b_l) = \mathrm{Ga}(\eta_l|a_l, b_l)$ and
compute the prior $p(\alpha)$. Defining $a = [a_1, \ldots, a_L]^T$ and
$b = [b_1, \ldots, b_L]^T$ we obtain
$p(\alpha; \epsilon, a, b) = \prod_{l=1}^{L} p(\alpha_l; \epsilon, a_l, b_l)$
with

$$ p(\alpha_l; \epsilon, a_l, b_l) = \int_0^\infty p(\alpha_l|\gamma_l) \, p(\gamma_l) \, d\gamma_l = \left( \frac{|\alpha_l|^2}{b_l} \right)^{\epsilon-1} \frac{\Gamma(\epsilon + a_l)\,\Gamma(a_l + 1)}{\pi b_l \, \Gamma(\epsilon)\,\Gamma(a_l)} \, U\!\left( \epsilon + a_l; \epsilon; \frac{|\alpha_l|^2}{b_l} \right). \tag{16} $$

In this expression, $U(\cdot;\cdot;\cdot)$ is the confluent
hypergeometric function [14]. In Fig. 2(a) we show the estimation rules
produced by the MAP solver for different values of $\epsilon$ and
fixed parameters $a_l$ and $b_l$ when $\Phi$ is orthonormal. It can
be seen that the estimation rules obtained with the 3-L prior
model approximate the hard-thresholding rule. In Fig. 2(b)
we depict the contour lines of the restriction to the
$\mathrm{Im}\{\alpha_1\} = \mathrm{Im}\{\alpha_2\} = 0$ plane of
$Q(\alpha_1, \alpha_2; \epsilon, a, b) \propto^e -\log(p(\alpha_1; \epsilon, a, b) p(\alpha_2; \epsilon, a, b))$.
Observe that although the contours behave qualitatively similarly
to those shown in Fig. 1(a) for the 2-L prior model, the
estimation rules in Fig. 2(a) and Fig. 1(b) are different.

² Let $f$ denote a function defined on a set $A$. The restriction of
$f$ to a subset $B \subset A$ is the function defined on $B$ that
coincides with $f$ on this subset.

Fig. 3. A factor graph that represents the joint pdf (11). In this
figure $f_y = p(y|\alpha, \lambda)$, $f_\alpha = p(\alpha|\gamma)$,
$f_\gamma = p(\gamma|\eta)$, $f_\eta = p(\eta)$, and
$f_\lambda = p(\lambda)$.

Naturally, the 3-L prior model encompasses three free
parameters, $\epsilon$, $a$, and $b$. The choice $\epsilon = 0$ and
$b_l$ small (in practice $b_l$ is set to a small power of ten,
$l = 1, \ldots, L$) induces a weighted log-sum penalization term. This
term is known to strongly promote a sparse estimate [10], [11]. Later
in the text we will also adopt this parameter setting.

V. Variational Message Passing 

In this section we present a VMP algorithm for estimating
$h$ in (4) given the observation $y$ in (5). Let
$\Theta = \{\alpha, \gamma, \eta, \lambda\}$ be the set of unknown
parameters and $p(y, \Theta)$ be the joint pdf specified in (11). The
factor graph [15] that encodes the factorization of $p(y, \Theta)$ is
shown in Fig. 3. Consider an auxiliary pdf $q(\Theta)$ for the unknown
parameters that factorizes according to
$q(\Theta) = q(\alpha) q(\gamma) q(\eta) q(\lambda)$. The VMP algorithm
is an iterative scheme that attempts to compute the auxiliary
pdf that minimizes the Kullback-Leibler (KL) divergence
$\mathrm{KL}(q(\Theta) \| p(\Theta|y))$. In the following we summarize
the key steps of the algorithm; the reader is referred to [16] for more
information on VMP.

From [16] the auxiliary function $q(\theta_i)$,
$\theta_i \in \Theta$, is updated as the product of incoming messages
from the neighboring factor nodes $f_n$ to the variable node
$\theta_i$:

$$ q(\theta_i) \propto \prod_{f_n \in \mathcal{N}_{\theta_i}} m_{f_n \to \theta_i}. \tag{17} $$

In (17) $\mathcal{N}_{\theta_i}$ is the set of factor nodes neighboring
the variable node $\theta_i$ and $m_{f_n \to \theta_i}$ denotes the
message from factor node $f_n$ to variable node $\theta_i$. This
message is computed as

$$ m_{f_n \to \theta_i} = \exp\left( \langle \ln f_n \rangle_{\prod_j q(\theta_j),\; \theta_j \in \mathcal{N}_{f_n} \setminus \{\theta_i\}} \right), \tag{18} $$

where $\mathcal{N}_{f_n}$ is the set of variable nodes neighboring the
factor node $f_n$. After an initialization procedure, the individual
factors of $q(\Theta)$ are then updated iteratively in a round-robin
fashion using (17) and (18).

We provide two versions of the VMP algorithm: one applied
to the 2-L prior model (referred to as VMP-2L) and another
one applied to the 3-L model (VMP-3L). The messages
corresponding to VMP-2L are easily obtained as a special
case of the messages computed for VMP-3L by assuming
$q(\eta_l) = \delta(\eta_l - \bar{\eta}_l)$, where $\bar{\eta}_l$ is
some fixed real number.
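1) Update of $q(\alpha)$: According to (17) and Fig. 3 the
computation of the update of $q(\alpha)$ requires evaluating the
product of messages $m_{f_y \to \alpha}$ and
$m_{f_\alpha \to \alpha}$. Multiplying these two messages yields the
Gaussian auxiliary pdf
$q(\alpha) = \mathrm{CN}(\alpha|\hat{\alpha}, \Sigma_\alpha)$ with
covariance matrix and mean given by

$$ \Sigma_\alpha = \left( \langle\lambda\rangle_{q(\lambda)} \Phi^H \Phi + V(\gamma) \right)^{-1}, \tag{19} $$
$$ \hat{\alpha} = \langle\alpha\rangle_{q(\alpha)} = \langle\lambda\rangle_{q(\lambda)} \Sigma_\alpha \Phi^H y. \tag{20} $$

In the above expression we have defined
$V(\gamma) = \mathrm{diag}(\langle\gamma_1^{-1}\rangle_{q(\gamma)}, \ldots, \langle\gamma_L^{-1}\rangle_{q(\gamma)})$.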
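The update (19)-(20) amounts to a regularized least-squares solve. A minimal sketch, assuming the current moments $\langle\lambda\rangle$ and $\langle\gamma_l^{-1}\rangle$ are passed in as `lam_mean` and `inv_gamma_mean` (hypothetical argument names), could look as follows. The helper `second_moment` uses the standard Gaussian identity $\langle|\alpha_l|^2\rangle = |\hat{\alpha}_l|^2 + [\Sigma_\alpha]_{ll}$, which is the quantity needed by the subsequent update of $q(\gamma)$.

```python
import numpy as np

def update_q_alpha(Phi, y, lam_mean, inv_gamma_mean):
    """Mean and covariance of q(alpha) according to (19) and (20)."""
    A = lam_mean * (Phi.conj().T @ Phi) + np.diag(inv_gamma_mean)
    Sigma = np.linalg.inv(A)
    alpha_hat = lam_mean * (Sigma @ (Phi.conj().T @ y))
    return alpha_hat, Sigma

def second_moment(alpha_hat, Sigma):
    """<|alpha_l|^2> under the Gaussian q(alpha)."""
    return np.abs(alpha_hat) ** 2 + np.real(np.diag(Sigma))
```

The explicit inverse is used here because the diagonal of $\Sigma_\alpha$ is needed anyway; for large $L$ a factorization-based solve may be preferable.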

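2) Update of $q(\gamma)$: The update of $q(\gamma)$ is proportional to
the product of the messages $m_{f_\alpha \to \gamma}$ and
$m_{f_\gamma \to \gamma}$:

$$ q(\gamma) \propto \prod_{l=1}^{L} \gamma_l^{\epsilon-2} \exp\left( -\gamma_l^{-1} \langle|\alpha_l|^2\rangle_{q(\alpha)} - \gamma_l \langle\eta_l\rangle_{q(\eta)} \right). \tag{21} $$

The right-hand side expression in (21) is recognized as the
product of Generalized Inverse Gaussian (GIG) pdfs [17] with
order $p = \epsilon - 1$. Observe that the computation of $V(\gamma)$
in (19) requires evaluating $\langle\gamma_l^{-1}\rangle_{q(\gamma)}$
for all $l = 1, \ldots, L$. Luckily, the moments of the GIG
distribution are given in closed form for any $n \in \mathbb{R}$ [17]:

$$ \langle\gamma_l^n\rangle_{q(\gamma)} = \left( \frac{\langle|\alpha_l|^2\rangle_{q(\alpha)}}{\langle\eta_l\rangle_{q(\eta)}} \right)^{\frac{n}{2}} \frac{K_{p+n}\left( 2\sqrt{\langle\eta_l\rangle_{q(\eta)} \langle|\alpha_l|^2\rangle_{q(\alpha)}} \right)}{K_p\left( 2\sqrt{\langle\eta_l\rangle_{q(\eta)} \langle|\alpha_l|^2\rangle_{q(\alpha)}} \right)}. \tag{22} $$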
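The moment formula (22) only requires ratios of modified Bessel functions. The sketch below evaluates them for scalar arguments with a simple quadrature of the integral representation $K_\nu(z) = \int_0^\infty \exp(-z\cosh t)\cosh(\nu t)\,dt$, scaled by $\exp(z)$ so that the ratio stays finite for large arguments. The function names, the quadrature grid, and the restriction to scalars and moderate orders are assumptions of this example; a library Bessel routine would normally be used instead.

```python
import numpy as np

def bessel_k_scaled(nu, z, t_max=40.0, n=20001):
    """exp(z) * K_nu(z) for scalar z > 0 and moderate |nu|, by trapezoid rule."""
    t = np.linspace(0.0, t_max, n)
    f = np.exp(-z * (np.cosh(t) - 1.0)) * np.cosh(nu * t)
    return float(np.sum(0.5 * (f[1:] + f[:-1])) * (t[1] - t[0]))

def gig_moment(n, abs_alpha_sq_mean, eta_mean, eps):
    """n-th moment of gamma_l under q(gamma), following (22) with p = eps - 1."""
    p = eps - 1.0
    z = 2.0 * np.sqrt(eta_mean * abs_alpha_sq_mean)
    ratio = bessel_k_scaled(p + n, z) / bessel_k_scaled(p, z)
    return (abs_alpha_sq_mean / eta_mean) ** (n / 2.0) * ratio

# <gamma_l^{-1}> as required by V(gamma) in (19), here for eps = 3/2.
inv_gamma_mean = gig_moment(-1, abs_alpha_sq_mean=4.0, eta_mean=1.0, eps=1.5)
```

For $\epsilon = 3/2$ the orders are $\pm 1/2$, the Bessel ratio equals one, and $\langle\gamma_l^{-1}\rangle$ reduces to $\sqrt{\langle\eta_l\rangle / \langle|\alpha_l|^2\rangle}$, which gives a convenient sanity check.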



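3) Update of $q(\eta)$: The update of $q(\eta)$ is proportional to
the product of messages $m_{f_\gamma \to \eta}$ and
$m_{f_\eta \to \eta}$:

$$ q(\eta) \propto \prod_{l=1}^{L} \eta_l^{\epsilon + a_l - 1} \exp\left( -(\langle\gamma_l\rangle_{q(\gamma)} + b_l)\,\eta_l \right). \tag{23} $$

Clearly, $q(\eta)$ factorizes as a product of $L$ gamma pdfs, one
for each individual entry in $\eta$. The first moment of $\eta_l$
required by the update of $q(\gamma)$ is easily computed as

$$ \langle\eta_l\rangle_{q(\eta)} = \frac{\epsilon + a_l}{\langle\gamma_l\rangle_{q(\gamma)} + b_l}. \tag{24} $$

Naturally, $q(\eta)$ is only computed for VMP-3L.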

Fig. 4. Comparison of the performance of the VMP-2L, VMP-3L, RWF, RVM,
and SpaRSA algorithms: (a) BER versus $E_b/N_0$, (b) MSE versus
$E_b/N_0$, (c) MSE versus number of available pilots $M$ with fixed
$L = 200$ and the ratio between received symbol power and noise
variance set to 15 dB. In (a,b) we have $M = 100$ and $L = 200$. In (a)
the dashed line shows the BER performance when the true channel vector
$h$ in (4) is known.

TABLE I
Parameter settings for the simulations. The convolutional
code and decoder has been implemented using [18].

Parameter              Value
---------------------  -----------------------
Sampling time, T_s     32.55 ns
CP length              4.69 μs / 144 T_s
Subcarrier spacing     15 kHz
Pilot pattern          Equally spaced, QPSK
Modulation             QPSK
Subcarriers, N         1200
Pilots, M              100
OFDM symbols           1
Information bits       727
Channel interleaver    Random
Convolutional code     (133,171,165)_8
Decoder                BCJR algorithm [19]

4) Update of $q(\lambda)$: It can be shown that
$q(\lambda) = \mathrm{Ga}(\lambda|M + c, \langle\|y - \Phi\alpha\|_2^2\rangle_{q(\alpha)} + d)$.
The first moment of $\lambda$ used in (19) and (20) is therefore

$$ \langle\lambda\rangle_{q(\lambda)} = \frac{M + c}{\langle\|y - \Phi\alpha\|_2^2\rangle_{q(\alpha)} + d}. \tag{25} $$
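The first moments in (24) and (25) can be sketched as below. The expected residual in (25) is expanded with the standard Gaussian identity $\langle\|y - \Phi\alpha\|_2^2\rangle_{q(\alpha)} = \|y - \Phi\hat{\alpha}\|_2^2 + \mathrm{tr}(\Phi\Sigma_\alpha\Phi^H)$; this expansion and the function and argument names are assumptions of the example and are not spelled out in the text.

```python
import numpy as np

def update_eta_mean(gamma_mean, eps, a, b):
    """First moment of eta_l under the Gamma pdf q(eta_l), as in (24)."""
    return (eps + a) / (gamma_mean + b)

def update_lambda_mean(Phi, y, alpha_hat, Sigma, c=0.0, d=0.0):
    """First moment of the noise precision lambda, as in (25)."""
    M = y.shape[0]
    residual = np.linalg.norm(y - Phi @ alpha_hat) ** 2
    spread = np.real(np.trace(Phi @ Sigma @ Phi.conj().T))
    return (M + c) / (residual + spread + d)
```

With the default `c = d = 0` the noise precision estimate is simply the number of observations divided by the expected residual energy.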

VI. Numerical Results 



We perform Monte Carlo simulations to evaluate the performance
of the two versions of the derived VMP algorithm
in Section V. We consider a scenario inspired by the 3GPP
LTE standard [20] with the settings specified in Table I. The
multipath channel (6) is based on the model used in [21]
where, for each realization of the channel, the total number
of multipath components $K$ is Poisson distributed with mean
$\langle K \rangle_{p(K)} = 10$ and the delays $\tau_k$,
$k = 1, \ldots, K$, are independent and uniformly distributed random
variables drawn from the continuous interval $[0, 144\,T_s]$
(corresponding to the CP length). The $k$th nonzero component
conditioned on the delay $\tau_k$ has a zero-mean complex circular
symmetric Gaussian distribution with variance
$\sigma^2(\tau_k) = \langle|\beta_k|^2\rangle_{p(\beta_k|\tau_k)} = u \exp(-\tau_k/v)$
and parameters $u, v > 0$.³
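A channel realization following this description can be drawn as sketched below. The decay rate $v = 20\,T_s$ and $\tau_{\max} = 144\,T_s$ are the values stated for the simulation scenario; the closed form used for $u$ is derived here from the unit-power normalization under the assumption that $K$ and the delays are independent, and the function names are hypothetical.

```python
import numpy as np

def power_scale(mean_K, tau_max, v):
    """u such that the expected total power sum_k |beta_k|^2 equals one.

    Uses E[K] * u * E[exp(-tau / v)] = 1 with tau uniform on [0, tau_max].
    """
    return tau_max / (mean_K * v * (1.0 - np.exp(-tau_max / v)))

def draw_channel(rng, mean_K=10, Ts=32.55e-9, tau_max_samples=144, v_samples=20):
    """Draw (beta, tau) for one realization of the multipath model (6)."""
    tau_max = tau_max_samples * Ts
    v = v_samples * Ts
    u = power_scale(mean_K, tau_max, v)
    K = rng.poisson(mean_K)                       # number of paths
    tau = rng.uniform(0.0, tau_max, K)            # continuous delays
    var = u * np.exp(-tau / v)                    # exponentially decaying power
    beta = np.sqrt(var / 2.0) * (rng.standard_normal(K)
                                 + 1j * rng.standard_normal(K))
    return beta, tau

beta, tau = draw_channel(np.random.default_rng(0))
```

Averaging the total power over many realizations should give a value close to one, which is a simple way to check the normalization.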

To initialize the VMP algorithm we set
$\langle\lambda\rangle_{q(\lambda)}$ and
$\langle\gamma_l^{-1}\rangle_{q(\gamma)}$ equal to the inverse of the
sample variance of $y$ and the inverse number of columns $L$,
respectively. Furthermore, we let $c = d = 0$ in (25), which
corresponds to the Jeffreys noninformative prior for $\lambda$. Once
the initialization is completed, the algorithm sequentially updates the
auxiliary pdfs $q(\alpha)$, $q(\gamma)$, $q(\eta)$, and $q(\lambda)$
until convergence is achieved. Obviously, $q(\eta)$ is only updated for
VMP-3L, whereas for VMP-2L the entries in $\eta$ are set to M. For both
versions we select $\epsilon = 0$ and for VMP-3L we set $a_l = 1$ and
$b_l$ to a small power of ten, $l = 1, \ldots, L$. Finally, the
dictionary $\Phi$ is specified by $M$ pilot subcarriers and a total of
$L = 200$ columns (corresponding to the choice
$\tau_{\max} = 144\,T_s$ and $\zeta \approx 1.4$ in (9)).

³ The parameter $u$ is computed such that
$\langle \sum_{k=1}^{K} |\beta_k|^2 \rangle_{p(\beta, \tau, K)} = 1$,
where $p(\beta, \tau, K)$ is the joint pdf of the parameters of the
channel model. In the considered simulation scenario,
$\langle K \rangle_{p(K)} = 10$, $\tau_{\max} = 144\,T_s$, and
$v = 20\,T_s$ (the decay rate).

The VMP is compared to a classical OFDM channel estimator
and two state-of-the-art sparse estimation schemes. Specifically,
we use as benchmark the robustly-designed Wiener Filter
(RWF) [22], the relevance vector machine (RVM) [10], [11],⁴
and the sparse reconstruction by separable approximation
(SpaRSA) algorithm [23].⁵ The RVM is an EM algorithm
based on the 2-L prior model of the student-t pdf over
each $\alpha_l$, whereas SpaRSA is a proximal gradient method for
solving (2). In case of the SpaRSA algorithm the regularization
parameter $\kappa$ needs to be set. In all simulations, we let
$\kappa = 2$, which leads to good performance in the high
signal-to-noise ratio (SNR) regime.

The performance is compared with respect to the resulting
bit-error-rate (BER) and mean-squared error (MSE) of the
estimate $\hat{h}$ versus the SNR ($E_b/N_0$). In addition, in order to
quantify the necessary pilot overhead, we evaluate the MSE
versus the number of available pilots $M$. Hence, in this setup
$M$ is no longer fixed as in Table I.

In Fig. 4(a) we compare the BER performance of the
different schemes. We see that VMP-3L outperforms the other
schemes across all the SNR range considered. Specifically, at
1 % BER the gain is approximately 2 dB compared to VMP-2L
and RVM and 3 dB compared to SpaRSA and RWF. Also
VMP-2L achieves lower BER in the SNR range 0 to 12 dB
compared to RVM and across the whole SNR range compared
to SpaRSA and RWF.

The superior BER performance of the VMP algorithm is
well reflected in the MSE performance shown in Fig. 4(b).
Again VMP-3L is a clear winner followed by VMP-2L. The
bad MSE performance of SpaRSA at low SNR is due to
the difficulty in specifying a suitable regularization parameter
$\kappa$ across a large SNR range.

⁴ The software is available on-line at http://dsp.ucsd.edu/~dwipf/
⁵ The software is available on-line at http://www.lx.it.pt/~mtf/SpaRSA/

We next fix the ratio between received symbol power
and noise variance to 15 dB⁶ and evaluate the MSE versus
number of available pilots $M$. The results are depicted in
Fig. 4(c). Observe a noticeable performance gain obtained
with VMP-3L. In particular, VMP-3L exhibits the same MSE
performance as VMP-2L and RVM using only approximately
85 pilots, roughly half as many as VMP-2L and RVM.
Furthermore, VMP-3L, using this number of pilots, significantly
outperforms SpaRSA and RWF using 200 pilots.

VII. Conclusion 

In this paper, we proposed channel estimators based on 
sparse Bayesian learning. The estimators rely on Bayesian 
hierarchical prior modeling and variational message passing 
(VMP). The VMP algorithm effectively exploits the proba- 
bilistic structure of the hierarchical prior models and the result- 
ing sparsity-inducing priors. Our numerical results show that 
the proposed channel estimators yield superior performance in 
terms of bit-error-rate and mean-squared error as compared to 
other existing estimators, including the estimator based on the 
$\ell_1$-norm constraint. They also allow for a significant reduction
of the amount of pilot subcarriers needed for estimating a given 
channel. 
⁶ Note that this value does not correspond with $E_b/N_0$ as represented in
Fig. 4(a) and 4(b). The specific $E_b/N_0$ depends on the number of bits in an
OFDM block, which in turn depends on the number of pilot symbols $M$.



